Abstract.

VSIPL and OpenMP are two open standards for portable high performance
computing. VSIPL delivers optimized single processor performance, while
OpenMP provides a low overhead mechanism for executing thread based
parallelism on shared memory systems. Image processing is one of the main
areas where VSIPL and OpenMP can make a large impact. Currently, a large
fraction of image processing applications are written in the Interactive
Data Language (IDL) environment. The aim of this work is to demonstrate
that the performance benefits of these new standards can be brought to the
image processing community in a high level manner that is transparent to
users. To this end, this talk presents a fast, FFT based algorithm for
performing image convolutions. This algorithm has been implemented within
the IDL environment using VSIPL (for optimized single processor
performance) with added OpenMP directives (for parallelism). This work
demonstrates that good parallel speedups are attainable using open
standards and that these standards can be integrated seamlessly into
existing user environments.



1. Introduction 

The Vector, Signal and Image Processing Library (VSIPL) [1] is an open
standard C language Application Programmer Interface (API) that supports
portable, optimized single processor programs. OpenMP [2] is an open standard
C/Fortran API that supports portable thread based parallelism on shared
memory computers. Both of these standards have enormous potential to allow
users to realize the goal of portable applications that are both parallel and
optimized.

Exploiting these new open standards requires integrating them into existing 
applications as well as using them in new efforts. Image processing is one of the 
key areas where VSIPL and OpenMP can make a large impact. Currently, a
large fraction of image processing applications are written in the
Interactive Data Language (IDL) environment [3]. The goal of this work is to
show that it is possible to bring the performance benefits of these new
standards to the image processing community in a high level manner that is
transparent to users.

Figure 1. Basic 2D Filtering. FFT implementation of 2D filtering, which
performs the mathematical operation:
$F(x, y) = \iint P(x', y')\, I(x - x', y - y')\, dx'\, dy'$



2. Approach 

Wide area 2D convolution is a staple of digital image processing (see Figure 1).
The advent of large format CCDs makes it possible to literally "pave" the
focal plane of an optical sensor with silicon. Processing of the large images
obtained from these systems is complicated by the non-uniform Point Response
Function (PRF) that is common in wide field of view instruments. This paper
presents a fast, FFT based algorithm for convolving such images. This algorithm
has been transparently implemented within the IDL environment using VSIPL (for
optimized single processor performance) with added OpenMP directives (for
parallelism).
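
The FFT implementation rests on the convolution theorem. As a brief reminder, with $\mathcal{F}$ denoting the 2D Fourier transform (notation assumed here for illustration, not taken from the paper's figures), the filtering integral becomes a pointwise product in the transform domain:

```latex
F(x, y) = \iint P(x', y')\, I(x - x', y - y')\, dx'\, dy'
\quad\Longleftrightarrow\quad
F = \mathcal{F}^{-1}\!\left[\, \mathcal{F}[P] \cdot \mathcal{F}[I] \,\right]
```

For an $N \times N$ image and a $K \times K$ PRF, direct evaluation costs on the order of $N^2 K^2$ operations, whereas the FFT route costs on the order of $N^2 \log N$, which is why the FFT form is preferred for large PRFs.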

The inputs to image convolution with variable PRFs consist of a source
image, a set of PRF images, and a grid that locates the center of each PRF on
the source image. The output image is the convolution of the input image with
each PRF, linearly weighted by distance from that PRF's grid center. The
computational basis of this convolution is a set of 2D overlap and add FFTs
with interpolation (see Figure 2). Today, typical image sizes range from
millions (2K x 2K) to billions (40K x 40K) of pixels. A single PRF is
typically thousands of pixels (100 x 100), but can be as small as 10 x 10 or
as large as the entire image. Over a single image, a PRF will be sampled as
few as once or as many as hundreds of times, depending on the optical system.
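
The overlap and add idea can be illustrated in one dimension. The following is a minimal sketch, not the paper's code: the function names `conv_full` and `overlap_add` are hypothetical, and a direct convolution stands in for the FFT that a real implementation would apply to each block. The signal is cut into blocks, each block is convolved on its own, and the block outputs, which overlap by k - 1 samples, are added at their offsets.

```c
#include <stddef.h>
#include <stdlib.h>

/* Full linear convolution of x (n samples) with h (k samples).
   out must hold n + k - 1 samples. Stand-in for an FFT-based kernel. */
static void conv_full(const double *x, size_t n,
                      const double *h, size_t k, double *out)
{
    for (size_t i = 0; i < n + k - 1; i++)
        out[i] = 0.0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < k; j++)
            out[i + j] += x[i] * h[j];
}

/* Overlap-add: convolve x block by block and accumulate the results.
   out must hold n + k - 1 samples. Returns 0 on success, -1 on bad
   arguments or allocation failure. */
int overlap_add(const double *x, size_t n, const double *h, size_t k,
                size_t block, double *out)
{
    if (block == 0 || k == 0)
        return -1;

    double *tmp = malloc((block + k - 1) * sizeof *tmp);
    if (tmp == NULL)
        return -1;

    for (size_t i = 0; i < n + k - 1; i++)
        out[i] = 0.0;

    for (size_t start = 0; start < n; start += block) {
        size_t len = (n - start < block) ? n - start : block;

        /* Each block yields len + k - 1 samples. */
        conv_full(x + start, len, h, k, tmp);

        /* The last k - 1 samples spill into the next block's range,
           so results are added rather than copied. */
        for (size_t i = 0; i < len + k - 1; i++)
            out[start + i] += tmp[i];
    }

    free(tmp);
    return 0;
}
```

The same pattern extends to two dimensions by tiling the image; each tile can then be paired with the PRF nearest to it.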

There are many opportunities for parallelism in this algorithm. The
simplest is to convolve each PRF separately on a different processor and then
combine all the results on a single processor. This approach works well with
VSIPL, OpenMP and IDL (see Figure 3). At the top level, a user passes the
inputs into an IDL routine, which passes pointers to an external C function.
Within the C function, OpenMP forks off multiple threads. Each thread executes
its convolution using VSIPL functions. The OpenMP threads are then rejoined
and the results are added. Finally, a pointer to the output image is returned
to the IDL environment in the same manner as for any other IDL routine.
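
The fork, convolve, join and add pattern described above might look roughly like the following C sketch. It is an illustration under stated assumptions, not the paper's implementation: the names `convolve_one` and `wide_field_filter` are hypothetical, the per-PRF kernel is a direct convolution standing in for the VSIPL FFT calls, each PRF is applied to the whole image rather than to its own portion, and the IDL glue is omitted.

```c
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for the per-PRF kernel (the paper uses VSIPL here).
   Direct "same size" 2D convolution of an n x n image with a k x k PRF;
   pixels outside the image are treated as zero. */
static void convolve_one(const float *img, int n,
                         const float *prf, int k, float *out)
{
    int half = k / 2;
    for (int y = 0; y < n; y++) {
        for (int x = 0; x < n; x++) {
            float acc = 0.0f;
            for (int v = 0; v < k; v++) {
                for (int u = 0; u < k; u++) {
                    int sy = y - (v - half);
                    int sx = x - (u - half);
                    if (sy >= 0 && sy < n && sx >= 0 && sx < n)
                        acc += prf[v * k + u] * img[sy * n + sx];
                }
            }
            out[y * n + x] = acc;
        }
    }
}

/* Convolve img with each of nprf PRFs in parallel, then form the weighted
   sum. prfs holds nprf kernels of k*k floats; weights holds nprf maps of
   n*n floats. Returns 0 on success, -1 on allocation failure. */
int wide_field_filter(const float *img, int n, const float *prfs, int k,
                      const float *weights, int nprf, float *out)
{
    size_t npix = (size_t)n * (size_t)n;
    float *partial = malloc((size_t)nprf * npix * sizeof *partial);
    if (partial == NULL)
        return -1;

    /* Fork: loop iterations (one per PRF) are shared among threads.
       Without OpenMP enabled the pragma is ignored and the loop runs
       serially. */
    #pragma omp parallel for schedule(dynamic)
    for (int p = 0; p < nprf; p++)
        convolve_one(img, n, prfs + (size_t)p * k * k, k,
                     partial + (size_t)p * npix);

    /* Join has happened; combine the results on a single thread. */
    for (size_t i = 0; i < npix; i++) {
        float acc = 0.0f;
        for (int p = 0; p < nprf; p++)
            acc += weights[(size_t)p * npix + i] * partial[(size_t)p * npix + i];
        out[i] = acc;
    }

    free(partial);
    return 0;
}
```

Because each iteration writes only to its own slice of `partial`, the threads need no locking; the cost is one temporary image per PRF, which a production version would likely reduce by filtering only the portion of the image near each grid center.
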

Figure 2. Wide Field Filtering. FFT implementation of 2D filtering for wide
field imaging with multiple point response functions. Each portion of the
image is filtered separately and then recombined using the appropriate
weights. The equivalent mathematical operation is:
$F_{ij}(x, y) = W_{ij}(x, y) \iint P_{ij}(x', y')\, I_{ij}(x - x', y - y')\, dx'\, dy'$

Figure 3. Layered Software Architecture. The user interacts with the top
layer, which provides high level abstractions for high productivity. Lower
layers provide performance via parallel processing and high performance
kernels.

Figure 4. Parallel Performance. Measured speedups of the wide field 2D
filtering application on a shared memory parallel system (SGI Origin2000).
Results indicate that linear speedups are achievable using open software
standards underneath high level programming languages.

3. Results 

This algorithm was implemented on an SGI Origin 2000 at Boston University [4].
This machine consists of 64 MIPS 10000 processors running at 300 MHz, with an
aggregate memory of 16 GBytes. IDL version 5.3 from Research Systems, Inc. was
used along with SGI's native OpenMP compiler (version 7.3.1) and the TASP
VSIPL implementation. Each component of the system was implemented just as it
would have been on its own. Integrating the pieces (IDL/OpenMP/VSIPL) was
done quickly, although care must be taken to use the latest versions of the
compilers and libraries. Once implemented, the software can be quickly ported
via Makefile modifications to any system that has IDL, OpenMP, and VSIPL
(currently these are SGI, HP, Sun, IBM, and Red Hat Linux). We have conducted
a variety of experiments that show linear speedups using different numbers of
processors and different image sizes (see Figure 4). Thus, it is possible to
achieve good performance using open standards underneath existing high level
languages.
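
For reference, "linear speedup" is read here in the conventional sense. With $T(p)$ denoting the wall-clock time on $p$ processors (symbols assumed for this note, not taken from the paper), speedup and parallel efficiency are:

```latex
S(p) = \frac{T(1)}{T(p)}, \qquad E(p) = \frac{S(p)}{p}
```

Linear speedup corresponds to $S(p) \approx p$, that is, $E(p) \approx 1$ across the processor counts tested.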